Abstract. We address the statistical issue of determining the
maximal spaces (maxisets) where model selection procedures attain
a given rate of convergence. By considering first general
dictionaries, then orthonormal bases, we characterize these maxisets
in terms of approximation spaces. These results are illustrated by
classical choices of wavelet model collections. For each of them,
the maxisets are described in terms of functional spaces. We pay
special attention to the issue of calculability and measure the
induced loss of performance in terms of maxisets.
1. Introduction

The topic of this paper lies on the frontier between statistics and
approximation theory. Our goal is to characterize the functions well
estimated by a special class of estimation procedures: the model
selection rules. Our purpose is not to build new model selection
estimators but to determine thoroughly the functions for which
well-known model selection procedures perform well. Of course,
approximation theory plays a crucial role in our setting but,
surprisingly, its role is even more important than that of statistical
tools. This statement will be emphasized by the use of the maxiset
approach, which illustrates the well-known fact that "well estimating
is well approximating".
More precisely we consider the classical Gaussian white noise model

    $dY_{n,t} = s(t)\,dt + \frac{1}{\sqrt{n}}\,dW_t, \quad t \in \mathcal{D},$

where $\mathcal{D} \subset \mathbb{R}$, $s$ is the unknown function, $W$
is the Brownian motion in $\mathbb{R}$ and
$n \in \mathbb{N}^* = \{1, 2, \dots\}$. This model means that for any
$u \in L_2(\mathcal{D})$,

    $Y_n(u) = \int_{\mathcal{D}} u(t)\,dY_{n,t} = \int_{\mathcal{D}} u(t)s(t)\,dt + \frac{1}{\sqrt{n}} W_u$

is observable where $W_u = \int_{\mathcal{D}} u(t)\,dW_t$ is a centered
Gaussian process such that for all functions $u$ and $u'$,

    $\mathbb{E}[W_u W_{u'}] = \int_{\mathcal{D}} u(t)u'(t)\,dt.$

We take a noise level of the form $1/\sqrt{n}$ to refer to the asymptotic
equivalence between the Gaussian white noise model and the classical
regression model with $n$ equispaced observations (see [26]).
Two questions naturally arise: how to construct an estimator $\hat s$ of
$s$ based on the observation $dY_{n,t}$ and how to measure its performance?
Many estimators have been proposed in this setting (wavelet thresh- 
olding, kernel rules, Bayesian procedures...). In this paper, we only 
focus on model selection techniques described accurately in the next 
paragraph. 

1.1. Model selection procedures. The model selection methodology
consists in constructing an estimator by minimizing an empirical
contrast $\gamma_n$ over a given set, called a model. The pioneering
work in model selection goes back to the 1970s with Mallows [20] and
Akaike [1]. Birgé and Massart develop the whole modern theory of model
selection in [U[T0l[lT] or [7] for instance. Estimation of a regression
function with model selection estimators is considered by Baraud in
[HIE], while inverse problems are tackled by Loubes and Ludena
[18, ;TS]. Finally model selection techniques provide nowadays valuable
tools in statistical learning (see Boucheron et al. [12]).

In nonparametric estimation, performances of estimators are usually
measured by using the quadratic norm, which gives rise to the following
empirical quadratic contrast

    $\gamma_n(u) = -2Y_n(u) + \|u\|^2$

for any function $u$, where $\|\cdot\|$ denotes the norm associated to
$L_2(\mathcal{D})$. We assume that we are given a dictionary of functions
of $L_2(\mathcal{D})$, denoted by $\Phi = (\varphi_i)_{i \in \mathcal{I}}$
where $\mathcal{I}$ is a countable set and we consider $\mathcal{M}_n$, a
collection of models spanned by some functions of $\Phi$. For any
$m \in \mathcal{M}_n$, we denote by $\mathcal{I}_m$ the subset of
$\mathcal{I}$ such that

    $m = \mathrm{span}\{\varphi_i : i \in \mathcal{I}_m\}$

and $D_m \le |\mathcal{I}_m|$ the dimension of $m$. Let $\hat s_m$ be the
function that minimizes the quadratic empirical criterion $\gamma_n(u)$
with respect to $u \in m$. A straightforward computation shows that the
estimator $\hat s_m$ is the projection of the data onto the space $m$.
So, if $\{e_1^m, \dots, e_{D_m}^m\}$ is an orthonormal basis (not
necessarily related to $\Phi$) of $m$ and

    $\hat\beta_i^m = Y_n(e_i^m) = \int_{\mathcal{D}} e_i^m(t)\,dY_{n,t}$

then

    $\hat s_m = \sum_{i=1}^{D_m} \hat\beta_i^m e_i^m, \quad \text{and} \quad \gamma_n(\hat s_m) = -\sum_{i=1}^{D_m} (\hat\beta_i^m)^2.$

Now, the issue is the selection of the best model $\hat m$ from the data
which gives rise to the model selection estimator $\hat s_{\hat m}$. For
this purpose, a penalized rule is considered, which aims at selecting an
estimator, close enough to the data, but still lying in a small space to
avoid overfitting issues. Let $\mathrm{pen}_n(m)$ be a penalty function
which increases when $D_m$ increases. The model $\hat m$ is selected
using the following penalized criterion

(1.1)    $\hat m = \arg\min_{m \in \mathcal{M}_n} \{\gamma_n(\hat s_m) + \mathrm{pen}_n(m)\}.$

The choice of the model collection and the associated penalty are then
the key issues handled by model selection theory. We point out that the
choices of both the model collection and the penalty function should
depend on the noise level. This is emphasized by the subscript $n$ for
$\mathcal{M}_n$ and $\mathrm{pen}_n(m)$.
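
As an illustration of criterion (1.1), the following minimal sketch
selects a model among nested models in a sequence-space version of the
problem. It is only a sketch under assumptions that are not in the text:
the models are nested (the model of dimension D is spanned by the first
D vectors of an orthonormal family, with D = 0 allowed), the penalty is
lam * D / n, and the names `select_nested_dimension`,
`projection_estimator`, `beta_hat` and `lam` are hypothetical.

```python
from itertools import accumulate


def select_nested_dimension(beta_hat, lam, n):
    """Return the dimension D minimizing -sum_{i<D} beta_hat[i]**2 + lam * D / n.

    beta_hat: empirical coefficients Y_n(e_i) in an orthonormal family
    (assumption: the model of dimension D is spanned by e_1, ..., e_D).
    """
    # energy[D] = sum of the first D squared empirical coefficients,
    # so that gamma_n(s_hat_m) = -energy[D] for the model of dimension D.
    energy = [0.0] + list(accumulate(b * b for b in beta_hat))
    # Penalized criterion gamma_n(s_hat_m) + pen_n(m) with pen_n(m) = lam * D / n.
    crit = [-energy[d] + lam * d / n for d in range(len(energy))]
    return min(range(len(crit)), key=crit.__getitem__)


def projection_estimator(beta_hat, dim):
    """Coefficients of the projection estimator on the model of dimension dim."""
    return [b if i < dim else 0.0 for i, b in enumerate(beta_hat)]


if __name__ == "__main__":
    coeffs = [1.0, 0.5, 0.01, 0.02]
    d_hat = select_nested_dimension(coeffs, lam=2.0, n=100)
    print(d_hat, projection_estimator(coeffs, d_hat))
```

The two small trailing coefficients are discarded because keeping each
of them would lower the contrast by less than the penalty increment
lam / n.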

The asymptotic behavior of model selection estimators has been studied
by many authors. We refer to Massart [21] for general references and
recall hereafter the main oracle type inequality. Such an oracle
inequality provides a non asymptotic control on the estimation error
with respect to a bias term $\|s - s_m\|$, where $s_m$ stands for the
best approximation (in the $L_2$ sense) of the function $s$ by a function
of $m$. In other words $s_m$ is the orthogonal projection of $s$ onto
$m$, defined by

    $s_m = \sum_{i=1}^{D_m} \beta_i^m e_i^m, \quad \beta_i^m = \int_{\mathcal{D}} e_i^m(t)s(t)\,dt.$

Theorem 1 (Theorem 4.2 of [21]). Let $n \in \mathbb{N}^*$ be fixed and
let $(x_m)_{m \in \mathcal{M}_n}$ be some family of positive numbers such
that

(1.2)    $\sum_{m \in \mathcal{M}_n} \exp(-x_m) = \Sigma_n < \infty.$

Let $\kappa > 1$ and assume that

(1.3)    $\mathrm{pen}_n(m) \ge \frac{\kappa}{n} \left(\sqrt{D_m} + \sqrt{2x_m}\right)^2.$

Then, almost surely, there exists some minimizer $\hat m$ of the
penalized least-squares criterion

    $\gamma_n(\hat s_m) + \mathrm{pen}_n(m)$

over $m \in \mathcal{M}_n$. Moreover, the corresponding penalized
least-squares estimator $\hat s_{\hat m}$ is unique and the following
inequality is valid:

(1.4)    $\mathbb{E}\left[\|\hat s_{\hat m} - s\|^2\right] \le C \left[ \inf_{m \in \mathcal{M}_n} \left\{ \|s_m - s\|^2 + \mathrm{pen}_n(m) \right\} + \frac{1 + \Sigma_n}{n} \right]$

where $C$ depends only on $\kappa$.

Equation (1.4) is the key result to establish optimality of penalized
estimators under oracle or minimax points of view. In this paper, we
focus on an alternative to these approaches: the maxiset point of view.

1.2. The maxiset point of view. Before describing the maxiset approach,
let us briefly recall that for a given procedure $s^* = (s_n^*)_n$, the
minimax study of $s^*$ consists in comparing the rate of convergence of
$s^*$ achieved on a given functional space $\mathcal{F}$ with the best
possible rate achieved by any estimator. More precisely, let
$\mathcal{F}(R)$ be the ball of radius $R$ associated with $\mathcal{F}$;
the procedure $s^* = (s_n^*)_n$ achieves the rate
$\rho^* = (\rho_n^*)_n$ on $\mathcal{F}(R)$ if

    $\sup_n \left[ (\rho_n^*)^{-2} \sup_{s \in \mathcal{F}(R)} \mathbb{E}\left[\|s_n^* - s\|^2\right] \right] < \infty.$

To check that a procedure is optimal from the minimax point of view
(said to be minimax), it must be proved that its rate of convergence
achieves the best rate among any procedure on each ball of the class.
This minimax approach is extensively used and many methods cited
above are proved to be minimax in different statistical frameworks.
However, the choice of the function class is subjective and, in the
minimax framework, statisticians have no idea whether there are other
functions well estimated at the rate $\rho^*$ by their procedure. A
different point of view is to consider the procedure $s^*$ as given and
search all the functions $s$ that are well estimated at a given rate
$\rho^*$: this is the maxiset approach, which has been proposed by
Kerkyacharian and Picard [17]. The maximal space, or maxiset, of the
procedure $s^*$ for this rate $\rho^*$ is defined as the set of all these
functions. Obviously, the larger the maxiset, the better the procedure.
We set the following definition.

Definition 1. Let $\rho^* = (\rho_n^*)_n$ be a decreasing sequence of
positive real numbers and let $s^* = (s_n^*)_n$ be an estimation
procedure. The maxiset of $s^*$ associated with the rate $\rho^*$ is

    $MS(s^*, \rho^*) = \left\{ s \in L_2(\mathcal{D}) : \sup_n \left\{ (\rho_n^*)^{-2}\, \mathbb{E}\left[\|s_n^* - s\|^2\right] \right\} < \infty \right\},$

and the ball of radius $R > 0$ of the maxiset is defined by

    $MS(s^*, \rho^*)(R) = \left\{ s \in L_2(\mathcal{D}) : \sup_n \left\{ (\rho_n^*)^{-2}\, \mathbb{E}\left[\|s_n^* - s\|^2\right] \right\} \le R^2 \right\}.$
Of course, there exist connections between maxiset and minimax points
of view: $s^*$ achieves the rate $\rho^*$ on $\mathcal{F}$ if and only if

    $\mathcal{F} \subset MS(s^*, \rho^*).$

In the white noise setting, the maxiset theory has been investigated for 
a wide range of estimation procedures, including kernel, thresholding 
and Lepski procedures, Bayesian or linear rules. We refer to [3J, [3], 
[E], [13], [TT| , [23], and [23] for general results. Maxisets have also been 
investigated for other statistical models, see [2J and [25J. 



1.3. Overview of the paper. The goal of this paper is to investigate
maxisets of model selection procedures. Following the classical model
selection literature, we only use penalties proportional to the dimension
$D_m$ of $m$:

(1.5)    $\mathrm{pen}_n(m) = \frac{\lambda_n}{n} D_m,$

with $\lambda_n$ to be specified. Our main result characterizes these
maxisets in terms of approximation spaces. More precisely, we establish
an equivalence between the statistical performance of $\hat s_{\hat m}$
and the approximation properties of the model collections
$\mathcal{M}_n$. With

(1.6)    $\rho_{n,\alpha} = \left( \frac{\lambda_n}{n} \right)^{\frac{\alpha}{1+2\alpha}}$

for any $\alpha > 0$, Theorem 2, combined with Theorem 1, proves that, for
a given function $s$, the quadratic risk
$\mathbb{E}[\|s - \hat s_{\hat m}\|^2]$ decays at the rate
$\rho_{n,\alpha}^2$ if and only if the deterministic quantity

(1.7)    $Q(s, n) = \inf_{m \in \mathcal{M}_n} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\}$

decays at the rate $\rho_{n,\alpha}^2$ as well. This result holds with
mild assumptions on $\lambda_n$ and under an embedding assumption on the
model collections ($\mathcal{M}_n \subset \mathcal{M}_{n+1}$). Once we
impose additional structure on the model collections, the deterministic
condition can be rephrased as a linear approximation property and a non
linear one as stated in Theorem 3. We illustrate these results for three
different model collections based on wavelet bases. The first one deals
with sieves in which all the models are embedded, the second one with the
collection of all subspaces spanned by vectors of a given basis. For
these examples, we handle the issue of calculability and give explicit
characterizations of the maxisets. In the third example, we provide an
intermediate choice of model collections and use the fact that the
embedding condition on the model collections can be relaxed. Finally
performances of these estimators are compared and discussed.
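
To make the quantities (1.6) and (1.7) concrete, here is a minimal
numerical sketch. It rests on assumptions that are not in the text: the
function $s$ is represented by a finite list of coefficients in an
orthonormal basis, the models are nested (the model of dimension D keeps
the first D coefficients), and the names `rate` and `q_nested` are
hypothetical.

```python
def rate(n, lam, alpha):
    """rho_{n,alpha} = (lam / n) ** (alpha / (1 + 2 * alpha)), as in (1.6)."""
    return (lam / n) ** (alpha / (1.0 + 2.0 * alpha))


def q_nested(beta, lam, n):
    """Q(s, n) of (1.7) for nested models keeping the first D coefficients.

    beta: true coefficients of s in an orthonormal basis (assumption).
    The bias term ||s_m - s||^2 is the energy of the discarded coefficients.
    """
    total = sum(b * b for b in beta)
    best = total  # model of dimension 0: pure bias, no penalty
    captured = 0.0
    for d, b in enumerate(beta, start=1):
        captured += b * b
        best = min(best, (total - captured) + lam * d / n)
    return best


if __name__ == "__main__":
    beta = [1.0, 0.5, 0.1]
    n, lam, alpha = 100, 1.0, 0.5
    # Ratio appearing in the maxiset characterization: rho^{-2} * Q(s, n).
    print(q_nested(beta, lam, n) / rate(n, lam, alpha) ** 2)
```

In this toy setting, membership in the maxiset amounts to this ratio
staying bounded as n grows.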

The paper is organized as follows. Section 2 describes the main general
results established in this paper. More precisely, we specify results
valid for general dictionaries in Section 2.1. In Section 2.2, we focus
on the case where $\Phi$ is an orthonormal family. Section 3 is devoted
to the illustrations of these results for some model selection
estimators associated with wavelet methods. In particular, a comparison
of maxiset performances is provided and discussed. Section 4 gives the
proofs of our results.

2. Main results 

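As explained in the introduction, our goal is to investigate maxisets
associated with model selection estimators $\hat s_{\hat m}$ where the
penalty function is defined in (1.5) and with the rate
$\rho_\alpha = (\rho_{n,\alpha})_n$ where $\rho_{n,\alpha}$ is specified
in (1.6). Observe that $\rho_{n,\alpha}$ depends on the choice of
$\lambda_n$. It can be for instance polynomial, or can take the
classical form

    $\rho_{n,\alpha} = \left( \frac{\log n}{n} \right)^{\frac{\alpha}{1+2\alpha}}.$

So we wish to determine

    $MS(\hat s_{\hat m}, \rho_\alpha) = \left\{ s \in L_2(\mathcal{D}) : \sup_n \left\{ \rho_{n,\alpha}^{-2}\, \mathbb{E}\left[\|\hat s_{\hat m} - s\|^2\right] \right\} < \infty \right\}.$

In the sequel, we use the following notation: if $\mathcal{F}$ is a given
space,

    $MS(\hat s_{\hat m}, \rho_\alpha) :=: \mathcal{F}$

means that for any $R > 0$, there exists $R' > 0$ such that

(2.1)    $MS(\hat s_{\hat m}, \rho_\alpha)(R) \subset \mathcal{F}(R')$

and for any $R' > 0$, there exists $R > 0$ such that

(2.2)    $\mathcal{F}(R') \subset MS(\hat s_{\hat m}, \rho_\alpha)(R).$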

2.1. The case of general dictionaries. In this section, we make
no assumption on $\Phi$. Theorem 1 is a non asymptotic result while
maxiset results deal with rates of convergence (with asymptotics in
$n$). Therefore obtaining maxiset results for model selection estimators
requires a structure on the sequence of model collections. We first
focus on the case of nested model collections
($\mathcal{M}_n \subset \mathcal{M}_{n+1}$). Note that this does not
imply a strong structure on the model collection for a given $n$. In
particular, this does not imply that the models are nested. Identifying
the maxiset $MS(\hat s_{\hat m}, \rho_\alpha)$ is a two-step procedure.
We need to establish inclusion (2.1) and inclusion (2.2). Recall that we
have introduced previously

    $Q(s, n) = \inf_{m \in \mathcal{M}_n} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\}.$

Roughly speaking, Theorem 1 established by Massart proves that any
function $s$ satisfying

    $\sup_n \left\{ \rho_{n,\alpha}^{-2}\, Q(s, n) \right\} \le (R')^2$

belongs to the maxiset $MS(\hat s_{\hat m}, \rho_\alpha)$ and thus
provides inclusion (2.2). The following theorem establishes inclusion
(2.1) and highlights that $Q(s, n)$ plays a capital role.

Theorem 2. Let $0 < \alpha_0 < \infty$ be fixed. Let us assume that the
sequence of model collections satisfies for any $n$

(2.3)    $\mathcal{M}_n \subset \mathcal{M}_{n+1},$

and that the sequence of positive numbers $(\lambda_n)_n$ is
non-decreasing and satisfies

(2.4)    $\lim_{n \to +\infty} n^{-1} \lambda_n = 0,$

and there exist $n_0 \in \mathbb{N}^*$ and two constants
$0 < \delta < \frac{1}{2}$ and $0 < p < 1$ such that for $n \ge n_0$,

(2.5)    $\lambda_{2n} \le 2(1 - \delta)\lambda_n,$

(2.6) J2 e 3 ^ 

meM n 

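and

(2.7)    $\lambda_{n_0} \ge \Upsilon(\delta, p, \alpha_0),$

where $\Upsilon(\delta, p, \alpha_0)$ is a positive constant only
depending on $\alpha_0$, $p$ and $\delta$ defined in Equation (4.3) of
Section 4. Then, the penalized rule $\hat s_{\hat m}$ is such that for
any $\alpha \in (0, \alpha_0]$, for any $R > 0$, there exists $R' > 0$
such that for $s \in L_2(\mathcal{D})$,

    $\sup_n \left\{ \rho_{n,\alpha}^{-2}\, \mathbb{E}\left[\|\hat s_{\hat m} - s\|^2\right] \right\} \le R^2 \;\Longrightarrow\; \sup_n \left\{ \rho_{n,\alpha}^{-2}\, Q(s, n) \right\} \le (R')^2.$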
Technical Assumptions (2.4), (2.5), (2.6) and (2.7) are very mild and
could be partly relaxed while preserving the results. Assumption (2.4)
is necessary to deal with rates converging to 0. Note that the classical
cases $\lambda_n = \lambda_0$ or $\lambda_n = \lambda_0 \log(n)$ satisfy
(2.4) and (2.5). Furthermore, Assumption (2.7) is always satisfied when
$\lambda_n = \lambda_0 \log(n)$ or when $\lambda_n = \lambda_0$ with
$\lambda_0$ large enough. Assumption (2.6) is very close to Assumptions
(1.2)-(1.3). In particular, if there exist two constants $\kappa > 1$
and $0 < p < 1$ such that for any $n$,

(2.8) 2^ e 2 <V 1 ~P 

meM n 

then, since

    $\mathrm{pen}_n(m) = \frac{\lambda_n}{n} D_m,$

Conditions (1.2), (1.3) and (2.6) are all satisfied. The assumption
$\alpha \in (0, \alpha_0]$ can be relaxed for particular model
collections, which will be highlighted in Proposition 2 of Section 3.1.
Finally, Assumption (2.3) can be removed for some special choice of model
collection $\mathcal{M}_n$ at the price of a slight overpenalization as
it shall be shown in Proposition 1 and Section 3.3.

Combining Theorems 1 and 2 gives a first characterization of the maxiset
of the model selection procedure $\hat s_{\hat m}$:

Corollary 1. Let $\alpha_0 < \infty$ be fixed. Assume that Assumptions
(2.3), (2.4), (2.5), (2.7) and (2.8) are satisfied. Then for any
$\alpha \in (0, \alpha_0]$,

    $MS(\hat s_{\hat m}, \rho_\alpha) :=: \left\{ s \in L_2(\mathcal{D}) : \sup_n \left\{ \rho_{n,\alpha}^{-2}\, Q(s, n) \right\} < \infty \right\}.$

The maxiset of $\hat s_{\hat m}$ is characterized by a deterministic
approximation property of $s$ with respect to the models
$\mathcal{M}_n$. It can be related to some classical approximation
properties of $s$ in terms of approximation rates if the functions of
$\Phi$ are orthonormal.

2.2. The case of orthonormal bases. From now on,
$\Phi = \{\varphi_i\}_{i \in \mathcal{I}}$ is assumed to be an
orthonormal basis (for the $L_2$ scalar product). We also assume that the
model collections $\mathcal{M}_n$ are constructed through restrictions of
a single model collection $\mathcal{M}$. Namely, given a collection of
models $\mathcal{M}$ we introduce a sequence $\mathcal{J}_n$ of
increasing subsets of the indices set $\mathcal{I}$ and we define the
intermediate collection $\mathcal{M}'_n$ as

(2.9)    $\mathcal{M}'_n = \left\{ m = \mathrm{span}\{\varphi_i : i \in \mathcal{I}_{\widetilde m} \cap \mathcal{J}_n\} : \widetilde m \in \mathcal{M} \right\}.$

The model collections $\mathcal{M}'_n$ do not necessarily satisfy the
embedding condition (2.3). Thus, we define

    $\mathcal{M}_n = \bigcup_{k \le n} \mathcal{M}'_k$

so $\mathcal{M}_n \subset \mathcal{M}_{n+1}$. The assumptions on $\Phi$
and on the model collections allow us to give an explicit
characterization of the maxisets. We denote
$\widetilde{\mathcal{M}} = \cup_n \mathcal{M}_n = \cup_n \mathcal{M}'_n$.
Remark that without any further assumption $\widetilde{\mathcal{M}}$ can
be a larger model collection than $\mathcal{M}$. Now, let us denote by
$V = (V_n)_n$ the sequence of approximation spaces defined by

    $V_n = \mathrm{span}\{\varphi_i : i \in \mathcal{J}_n\}$

and consider the corresponding approximation space

    $\mathcal{L}^\alpha_V = \left\{ s \in L_2(\mathcal{D}) : \sup_n \left\{ \rho_{n,\alpha}^{-1}\, \|P_{V_n} s - s\| \right\} < \infty \right\},$

where $P_{V_n} s$ is the projection of $s$ onto $V_n$. Define also
another kind of approximation sets:

    $\mathcal{A}^\alpha_{\widetilde{\mathcal{M}}} = \left\{ s \in L_2(\mathcal{D}) : \sup_{M > 0} \left\{ M^\alpha \inf_{\{m \in \widetilde{\mathcal{M}} :\, D_m \le M\}} \|s_m - s\| \right\} < \infty \right\}.$

The corresponding balls of radius $R > 0$ are defined, as usual, by
replacing $\infty$ by $R$ in the previous definitions. We have the
following result.

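Theorem 3. Let $\alpha_0 < \infty$ be fixed. Assume that (2.4), (2.5),
(2.7) and (2.8) are satisfied. Then, the penalized rule
$\hat s_{\hat m}$ satisfies the following result: for any
$\alpha \in (0, \alpha_0]$,

    $MS(\hat s_{\hat m}, \rho_\alpha) :=: \mathcal{A}^\alpha_{\widetilde{\mathcal{M}}} \cap \mathcal{L}^\alpha_V.$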
The result pointed out in Theorem 3 links the performance of the
estimator to an approximation property for the estimated function.
This approximation property is decomposed into a linear approximation
measured by $\mathcal{L}^\alpha_V$ and a non linear approximation
measured by $\mathcal{A}^\alpha_{\widetilde{\mathcal{M}}}$. The linear
condition is due to the use of the reduced model collection
$\mathcal{M}_n$ instead of $\mathcal{M}$, which is often necessary to
ensure either the calculability of the estimator or Condition (2.8). It
plays the role of a minimum regularity property that is easily
satisfied.
Observe that if we have one model collection, that is for any $k$ and
$k'$, $\mathcal{M}_k = \mathcal{M}_{k'} = \mathcal{M}$, then
$\mathcal{J}_n = \mathcal{I}$ for any $n$ and thus
$\widetilde{\mathcal{M}} = \mathcal{M}$. Then

    $\mathcal{L}^\alpha_V = \mathrm{span}\{\varphi_i : i \in \mathcal{I}\}$

and Theorem 3 gives

    $MS(\hat s_{\hat m}, \rho_\alpha) :=: \mathcal{A}^\alpha_{\mathcal{M}}.$

The spaces $\mathcal{A}^\alpha_{\widetilde{\mathcal{M}}}$ and
$\mathcal{L}^\alpha_V$ highly depend on the models and the approximation
space. At first glance, the best choice seems to be
$V_n = L_2(\mathcal{D})$ and

    $\mathcal{M} = \{m : \mathcal{I}_m \subset \mathcal{I}\}$

since the infimum in the definition of
$\mathcal{A}^\alpha_{\widetilde{\mathcal{M}}}$ becomes smaller when the
collection is enriched. There is however a price to pay when enlarging
the model collection: the penalty has to be larger to satisfy (2.8),
which deteriorates the convergence rate. A second issue comes from
the tractability of the minimization (1.1) itself which will further
limit the size of the model collection.

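To avoid considering the union of the $\mathcal{M}'_k$, which can
dramatically increase the number of models considered for a fixed $n$,
leading to large penalties, we can relax the assumption that the penalty
is proportional to the dimension. Namely, for any $n$, for any
$m \in \mathcal{M}'_n$, there exists $\widetilde m \in \mathcal{M}$
such that

    $m = \mathrm{span}\{\varphi_i : i \in \mathcal{I}_{\widetilde m} \cap \mathcal{J}_n\}.$

Then for any model $m \in \mathcal{M}'_n$, we replace the dimension
$D_m$ by the larger dimension $D_{\widetilde m}$ and we set

    $\widetilde{\mathrm{pen}}_n(m) = \frac{\lambda_n}{n} D_{\widetilde m}.$

The minimization of the corresponding penalized criterion over all
models in $\mathcal{M}'_n$ leads to a result similar to Theorem 3.
Mimicking its proof, we can state the following proposition that will be
used in Section 3.3.

Proposition 1. Let $\alpha_0 < \infty$ be fixed. Assume (2.4), (2.5),
(2.7) and (2.8) are satisfied. Then, the penalized estimator
$\hat s_{\widetilde m}$ where

    $\widetilde m = \arg\min_{m \in \mathcal{M}'_n} \left\{ \gamma_n(\hat s_m) + \widetilde{\mathrm{pen}}_n(m) \right\}$

satisfies the following result: for any $\alpha \in (0, \alpha_0]$,

    $MS(\hat s_{\widetilde m}, \rho_\alpha) :=: \mathcal{A}^\alpha_{\mathcal{M}} \cap \mathcal{L}^\alpha_V.$

Remark that $\mathcal{M}_n$, $\mathcal{L}^\alpha_V$ and
$\mathcal{A}^\alpha_{\widetilde{\mathcal{M}}}$ can be defined in a
similar fashion for any arbitrary dictionary $\Phi$. However, one can
only obtain the inclusion
$MS(\hat s_{\hat m}, \rho_\alpha) \subset \mathcal{A}^\alpha_{\widetilde{\mathcal{M}}} \cap \mathcal{L}^\alpha_V$
in the general case.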
3. Comparisons of model selection estimators

The aim of this section is twofold. Firstly, we propose to illustrate our
previous maxiset results on different model selection estimators built
with wavelet methods by identifying precisely the spaces
$\mathcal{A}^\alpha_{\widetilde{\mathcal{M}}}$ and
$\mathcal{L}^\alpha_V$. Secondly, comparisons between the performances of
these estimators are provided and discussed.

We briefly recall the construction of periodic wavelet bases of the
interval $[0, 1]$. Let $\phi$ and $\psi$ be two compactly supported
functions of $L_2(\mathbb{R})$ and denote for all $j \in \mathbb{N}$, all
$k \in \mathbb{Z}$ and all $x \in \mathbb{R}$,
$\phi_{jk}(x) = 2^{j/2}\phi(2^j x - k)$ and
$\psi_{jk}(x) = 2^{j/2}\psi(2^j x - k)$. Those functions can be
periodized in such a way that

    $\Psi = \{\phi_{00}, \psi_{jk} : j \ge 0,\ k \in \{0, \dots, 2^j - 1\}\}$

constitutes an orthonormal basis of $L_2([0, 1])$. Some popular examples
of such bases are given in [15]. The function $\phi$ is called the
scaling function and $\psi$ the corresponding wavelet. Any periodic
function $s \in L_2([0, 1])$ can be represented as:

    $s = \alpha_{00}\phi_{00} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk}$

where

    $\alpha_{00} = \int_{[0,1]} s(t)\phi_{00}(t)\,dt$

and for any $j \in \mathbb{N}$ and for any $k \in \{0, \dots, 2^j - 1\}$

    $\beta_{jk} = \int_{[0,1]} s(t)\psi_{jk}(t)\,dt.$

Finally, we recall the characterization of Besov spaces using wavelets.
Such spaces will play an important role in the following. In this section
we assume that the multiresolution analysis associated with the basis
$\Psi$ is $r$-regular with $r > 1$ as defined in [22]. In this case, for
any $0 < \alpha < r$ and any $1 \le p, q \le \infty$, the periodic
function $s$ belongs to the Besov space $B^\alpha_{p,q}$ if and only if
$|\alpha_{00}| < \infty$ and

    $\sum_{j=0}^{\infty} 2^{jq(\alpha + \frac{1}{2} - \frac{1}{p})} \|\beta_{j.}\|^q_{\ell_p} < \infty$   if $q < \infty$,

    $\sup_{j \in \mathbb{N}} 2^{j(\alpha + \frac{1}{2} - \frac{1}{p})} \|\beta_{j.}\|_{\ell_p} < \infty$   if $q = \infty$
where $(\beta_{j.}) = (\beta_{jk})_k$. This characterization allows us
to recall the following embeddings:

    $B^\alpha_{p,q} \subsetneq B^{\alpha'}_{p',q'}$   as soon as $\alpha - \frac{1}{p} \ge \alpha' - \frac{1}{p'}$, $p < p'$ and $q \le q'$

and

    $B^\alpha_{p,\infty} \subsetneq B^\alpha_{2,\infty}$   as soon as $p > 2$.

3.1. Collection of Sieves. We consider first a single model collection
corresponding to a class of nested models

    $\mathcal{M}^{(s)} = \{m = \mathrm{span}\{\phi_{00}, \psi_{jk} : j < N_m,\ 0 \le k < 2^j\} : N_m \in \mathbb{N}\}.$

For such a model collection, Theorem 3 could be applied with
$V_n = L_2$. One can even remove Assumption (2.7) which imposes a minimum
value on $\lambda_{n_0}$ that depends on the rate $\rho_\alpha$:

Proposition 2. Let $0 < \alpha < r$ and let $\hat s^{(s)}_{\hat m}$ be
the model selection estimator associated with the model collection
$\mathcal{M}^{(s)}$. Then, under Assumptions (f23D, flZSD and f[2H ,

    $MS(\hat s^{(s)}_{\hat m}, \rho_\alpha) :=: B^\alpha_{2,\infty}.$

Remark that it suffices to choose $\lambda_n \ge \lambda_0$ with
$\lambda_0$, independent of $\alpha$, large enough to ensure Condition
(2.8).

It is important to notice that the estimator $\hat s^{(s)}_{\hat m}$
cannot be computed in practice because to determine the best model
$\hat m$ one needs to consider an infinite number of models, which cannot
be done without computing an infinite number of wavelet coefficients. To
overcome this issue, we specify a maximum resolution level $j_0(n)$ for
estimation where $n \mapsto j_0(n)$ is non-decreasing. This modification
is also in the scope of Theorem 3: it corresponds to

    $V_n = \mathrm{span}\{\phi_{00}, \psi_{jk} : 0 \le j < j_0(n),\ 0 \le k < 2^j\}$

and the model collection $\mathcal{M}^{(s)}_n$ defined as follows:

    $\mathcal{M}^{(s)}_n = \mathcal{M}'^{(s)}_n = \{m \in \mathcal{M}^{(s)} : N_m \le j_0(n)\}.$

For the specific choice

(3.1)    $2^{j_0(n)} \le n\lambda_n^{-1} < 2^{j_0(n)+1},$
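we obtain:

    $\mathcal{L}^\alpha_V = \left\{ s = \alpha_{00}\phi_{00} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk} \in L_2 : \sup_{n \in \mathbb{N}^*} 2^{\frac{j_0(n)\alpha}{1+2\alpha}} \|s - P_{V_n} s\| < \infty \right\}$

    $= \left\{ s = \alpha_{00}\phi_{00} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk} \in L_2 : \sup_{n \in \mathbb{N}^*} 2^{\frac{2 j_0(n)\alpha}{1+2\alpha}} \sum_{j \ge j_0(n)} \sum_k \beta_{jk}^2 < \infty \right\}$

    $= B^{\frac{\alpha}{1+2\alpha}}_{2,\infty}.$

Since $B^{\frac{\alpha}{1+2\alpha}}_{2,\infty} \cap B^\alpha_{2,\infty}$
reduces to $B^\alpha_{2,\infty}$, arguments of the proofs of Theorem 3
and Proposition 2 give:

Proposition 3. Let $0 < \alpha < r$ and let $\hat s^{(st)}_{\hat m}$ be
the model selection estimator associated with the model collection
$\mathcal{M}^{(s)}_n$. Then, under Assumptions (E3D, (E2D and (EHD

    $MS(\hat s^{(st)}_{\hat m}, \rho_\alpha) :=: B^\alpha_{2,\infty}.$

This tractable procedure is thus as efficient as the original one. We
obtain the maxiset behavior of the non adaptive linear wavelet procedure
pointed out in [23] but here the procedure is completely data-driven.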

3.2. The largest model collections. In this paragraph we enlarge
the model collections in order to obtain much larger maxisets. We start
with the following model collection

    $\mathcal{M}^{(l)} = \{m = \mathrm{span}\{\phi_{00}, \psi_{jk} : (j, k) \in \mathcal{I}_m\} : \mathcal{I}_m \in \mathcal{P}(\mathcal{I})\}$

where

    $\mathcal{I} = \bigcup_{j \ge 0} \{(j, k) : k \in \{0, 1, \dots, 2^j - 1\}\}$

and $\mathcal{P}(\mathcal{I})$ is the set of all subsets of
$\mathcal{I}$. This model collection is so rich that whatever the
sequence $(\lambda_n)_n$, Condition (2.8) (or even Condition (1.2)) is
not satisfied. To reduce the cardinality of the collection, we restrict
the maximum resolution level to the resolution level $j_0(n)$ defined in
(3.1) and consider the collections $\mathcal{M}^{(l)}_n$ defined from
$\mathcal{M}^{(l)}$ by

    $\mathcal{M}^{(l)}_n = \mathcal{M}'^{(l)}_n = \{m \in \mathcal{M}^{(l)} : \mathcal{I}_m \in \mathcal{P}(\mathcal{I}^{j_0})\}$

where

    $\mathcal{I}^{j_0} = \bigcup_{0 \le j < j_0(n)} \{(j, k) : k \in \{0, 1, \dots, 2^j - 1\}\}.$

Remark that this corresponds to the same choice of $V_n$ as in the
previous paragraph and the corresponding estimator fits perfectly within
the framework of Theorem 3.
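The classical logarithmic penalty

    $\mathrm{pen}_n(m) = \frac{\lambda_0 \log(n) D_m}{n},$

which corresponds to $\lambda_n = \lambda_0 \log(n)$, is sufficient to
ensure Condition (2.8) as soon as $\lambda_0$ is a constant large enough
(the choice $\lambda_n = \lambda_0$ is not sufficient). The
identification of the corresponding maxiset focuses on the
characterization of the space $\mathcal{A}^\alpha_{\mathcal{M}^{(l)}}$
since, as previously,
$\mathcal{L}^\alpha_V = B^{\frac{\alpha}{1+2\alpha}}_{2,\infty}$. We rely
on sparsity properties of $\mathcal{A}^\alpha_{\mathcal{M}^{(l)}}$. In
our context, sparsity means that there is a small proportion of large
coefficients of a signal. Let us introduce, for $n \in \mathbb{N}^*$,
the notation

    $|\beta|_{(n)} = \inf\{u : \mathrm{card}\{(j, k) \in \mathbb{N} \times \{0, 1, \dots, 2^j - 1\} : |\beta_{jk}| > u\} < n\}$

to represent the non-increasing rearrangement of the wavelet coefficients
of a periodic signal $s$:

    $|\beta|_{(1)} \ge |\beta|_{(2)} \ge \dots \ge |\beta|_{(n)} \ge \dots.$

As the best model $m \in \mathcal{M}^{(l)}$ of prescribed dimension $M$
is obtained by choosing the subset of indices corresponding to the $M$
largest wavelet coefficients, a simple identification of the space
$\mathcal{A}^\alpha_{\mathcal{M}^{(l)}}$ is

    $\mathcal{A}^\alpha_{\mathcal{M}^{(l)}} = \left\{ s = \alpha_{00}\phi_{00} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk} \in L_2 : \sup_{M \in \mathbb{N}^*} M^{2\alpha} \sum_{i=M+1}^{\infty} |\beta|^2_{(i)} < \infty \right\}.$

Theorem 2.1 of [17] provides a characterization of this space as a weak
Besov space:

    $\mathcal{A}^\alpha_{\mathcal{M}^{(l)}} = \mathcal{W}_{\frac{2}{1+2\alpha}},$

with for any $q \in\, ]0, 2[$,

    $\mathcal{W}_q = \left\{ s = \alpha_{00}\phi_{00} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk} \in L_2 : \sup_{n \in \mathbb{N}^*} n^{1/q} |\beta|_{(n)} < \infty \right\}.$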
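Following their definitions, the larger $\alpha$, the smaller
$q = 2/(1 + 2\alpha)$ and the sparser the sequence
$(\beta_{jk})_{j,k}$. Lemma 2.2 of [17] shows that the spaces
$\mathcal{W}_q$ ($0 < q < 2$) have other characterizations in terms of
wavelet coefficients:

    $\mathcal{W}_q = \left\{ s = \alpha_{00}\phi_{00} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk} \in L_2 : \sup_{u > 0} u^{q-2} \sum_j \sum_k \beta_{jk}^2 \mathbf{1}_{|\beta_{jk}| \le u} < \infty \right\}$

    $= \left\{ s = \alpha_{00}\phi_{00} + \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk} \in L_2 : \sup_{u > 0} u^q \sum_j \sum_k \mathbf{1}_{|\beta_{jk}| > u} < \infty \right\}.$

We obtain thus the following proposition.

Proposition 4. Let $\alpha_0 < r$ be fixed, let $0 < \alpha \le \alpha_0$
and let $\hat s^{(l)}_{\hat m}$ be the model selection estimator
associated with the model collection $\mathcal{M}^{(l)}_n$.
Then, under Assumptions fl23j) . fl2TTD and (j^gjl :

    $MS(\hat s^{(l)}_{\hat m}, \rho_\alpha) :=: B^{\frac{\alpha}{1+2\alpha}}_{2,\infty} \cap \mathcal{W}_{\frac{2}{1+2\alpha}}.$

Observe that the estimator $\hat s^{(l)}_{\hat m}$ is easily tractable
from a computational point of view as the minimization can be rewritten
coefficientwise: with $\hat\alpha_{00} = Y_n(\phi_{00})$ and
$\hat\beta_{jk} = Y_n(\psi_{jk})$,

    $\hat m = \arg\min_{m \in \mathcal{M}^{(l)}_n} \left\{ \gamma_n(\hat s_m) + \frac{\lambda_n}{n} D_m \right\} = \arg\min_{m \in \mathcal{M}^{(l)}_n} \sum_{j=0}^{j_0(n)-1} \sum_{k=0}^{2^j - 1} \left( \hat\beta_{jk}^2 \mathbf{1}_{(j,k) \notin \mathcal{I}_m} + \frac{\lambda_n}{n} \mathbf{1}_{(j,k) \in \mathcal{I}_m} \right).$

The best subset is thus the set
$\{(j, k) \in \mathcal{I}^{j_0} : |\hat\beta_{jk}| > \sqrt{\lambda_n / n}\}$
and $\hat s^{(l)}_{\hat m}$ corresponds to the well-known hard
thresholding estimator,

    $\hat s^{(l)}_{\hat m} = \hat\alpha_{00}\phi_{00} + \sum_{j=0}^{j_0(n)-1} \sum_{k=0}^{2^j - 1} \hat\beta_{jk} \mathbf{1}_{|\hat\beta_{jk}| > \sqrt{\lambda_n / n}}\, \psi_{jk}.$

Proposition 4 corresponds thus to the maxiset result established by
Kerkyacharian and Picard [17].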
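
The equivalence between penalized selection over all subsets and hard
thresholding can be checked numerically on a small example. The sketch
below is an illustration only: it works on a short list of empirical
coefficients, ignores the scaling coefficient (which belongs to every
model and only shifts the criterion by a constant), and uses
hypothetical function names.

```python
from itertools import combinations
from math import sqrt


def best_subset_bruteforce(beta_hat, lam, n):
    """Minimize -sum_{i in I} beta_hat[i]**2 + lam * |I| / n over all subsets I."""
    indices = range(len(beta_hat))
    best, best_crit = frozenset(), 0.0  # empty model: criterion 0
    for size in range(1, len(beta_hat) + 1):
        for subset in combinations(indices, size):
            crit = -sum(beta_hat[i] ** 2 for i in subset) + lam * size / n
            if crit < best_crit:
                best, best_crit = frozenset(subset), crit
    return best


def hard_threshold_support(beta_hat, lam, n):
    """Indices kept by hard thresholding at level sqrt(lam / n)."""
    threshold = sqrt(lam / n)
    return frozenset(i for i, b in enumerate(beta_hat) if abs(b) > threshold)


if __name__ == "__main__":
    coeffs = [0.9, -0.05, 0.3, 0.12, -0.4]
    print(sorted(best_subset_bruteforce(coeffs, lam=4.0, n=100)))
    print(sorted(hard_threshold_support(coeffs, lam=4.0, n=100)))
```

The brute-force search is exponential in the number of coefficients,
whereas the thresholding rule is linear, which is the computational
point made above.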

3.3. A special strategy for Besov spaces. We consider now the
model collection proposed by Massart [21]. This collection can be
viewed as a hybrid collection between the collections of Sections 3.1
and 3.2. This strategy turns out to be minimax for all Besov spaces
$B^\alpha_{p,\infty}$ when $\alpha > \max(1/p - 1/2, 0)$ and
$1 \le p \le \infty$.
More precisely, for a chosen $\theta > 2$, define the model collection by

    $\mathcal{M}^{(h)} = \{m = \mathrm{span}\{\phi_{00}, \psi_{jk} : (j, k) \in \mathcal{I}_m\} : J \in \mathbb{N},\ \mathcal{I}_m \in \mathcal{P}_J(\mathcal{I})\},$

where for any $J \in \mathbb{N}$, $\mathcal{P}_J(\mathcal{I})$ is the
set of all subsets $\mathcal{I}_m$ of $\mathcal{I}$ that can be written

    $\mathcal{I}_m = \{(j, k) : 0 \le j < J,\ 0 \le k < 2^j\} \cup \bigcup_{j \ge J} \{(j, k) : k \in A_j,\ |A_j| = \lfloor 2^J (j - J + 1)^{-\theta} \rfloor\}$

with $\lfloor x \rfloor := \max\{n \in \mathbb{N} : n \le x\}$.
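
As a small sanity check on this construction, the dimension of a model
of this hybrid collection can be computed explicitly. The sketch below
assumes the reading of the index set in which all coefficients of
levels $0 \le j < J$ are kept and, at each level $j \ge J$, exactly
$\lfloor 2^J (j - J + 1)^{-\theta} \rfloor$ coefficients are kept; the
function name is hypothetical.

```python
from math import floor


def hybrid_model_dimension(J, theta):
    """Dimension of a model indexed by P_J(I) under the reading stated above."""
    # phi_00 plus all wavelets at levels 0 <= j < J: 1 + (2**J - 1) = 2**J.
    dim = 2 ** J
    j = J
    while True:
        kept = floor(2 ** J * (j - J + 1) ** (-theta))
        if kept == 0:
            break  # later levels contribute nothing since the count is non-increasing
        dim += kept
        j += 1
    return dim


if __name__ == "__main__":
    # J = 3, theta = 3: 8 + 8 (level 3) + 1 (level 4) = 17.
    print(hybrid_model_dimension(3, 3.0))
```

The loop terminates because $(j - J + 1)^{-\theta}$ tends to 0, so only
finitely many levels contribute and the dimension depends on $J$ alone.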

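As remarked in [21], for any $J \in \mathbb{N}$ and any
$\mathcal{I}_m \in \mathcal{P}_J(\mathcal{I})$, the dimension $D_m$ of
the corresponding model $m$ depends only on $J$ and is such that

    $2^J \le D_m \le 2^J \left( 1 + \sum_{n \ge 1} n^{-\theta} \right).$

We denote by $D_J$ this common dimension. Note that the model collection
$\mathcal{M}^{(h)}$ does not vary with $n$. Using Theorem 3 with
$V_n = L_2$, we have the following proposition.

Proposition 5. Let $\alpha_0 < r$ be fixed, let
$0 < \alpha \le \alpha_0$ and let $\hat s^{(h)}_{\hat m}$ be the model
selection estimator associated with the model collection
$\mathcal{M}^{(h)}$. Then, under Assumptions ((211) . fl23|) . (TO) and

    $MS(\hat s^{(h)}_{\hat m}, \rho_\alpha) :=: \mathcal{A}^\alpha_{\mathcal{M}^{(h)}},$

with

    $\mathcal{A}^\alpha_{\mathcal{M}^{(h)}} = \left\{ s = \alpha_{00}\phi_{00} + \sum_{j \ge 0} \sum_{k=0}^{2^j - 1} \beta_{jk}\psi_{jk} \in L_2 : \sup_{J \ge 0} 2^{2J\alpha} \sum_{j \ge J} \sum_{k > \lfloor 2^J (j - J + 1)^{-\theta} \rfloor} |\beta_j|^2_{(k)} < \infty \right\}$

where $(|\beta_j|_{(k)})_k$ is the reordered sequence of coefficients
$(\beta_{jk})_k$:

    $|\beta_j|_{(1)} \ge |\beta_j|_{(2)} \ge \dots \ge |\beta_j|_{(k)} \ge \dots \ge |\beta_j|_{(2^j)}.$

Remark that, as in Section 3.1, as soon as $\lambda_n \ge \lambda_0$ with
$\lambda_0$ large enough, Condition (2.8) holds.

This large set cannot be characterized in terms of classical spaces.
Nevertheless it is undoubtedly a large functional space, since as proved
in Section 4.4, for every $\alpha > 0$ and every $p \ge 1$ satisfying
$p > 2/(2\alpha + 1)$ we get

(3.2)    $B^\alpha_{p,\infty} \subsetneq \mathcal{A}^\alpha_{\mathcal{M}^{(h)}}.$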

This new procedure is not computable since one needs an infinite number
of wavelet coefficients to perform it. The problem of calculability can
be solved by introducing, as previously, a maximum scale $j_0(n)$ as
defined in (3.1). We consider the class of model collections
$(\mathcal{M}^{(h)}_n)_n$ defined as follows:

    $\mathcal{M}^{(h)}_n = \{m = \mathrm{span}\{\phi_{00}, \psi_{jk} : (j, k) \in \mathcal{I}_m,\ j < j_0(n)\} : J \in \mathbb{N},\ \mathcal{I}_m \in \mathcal{P}_J(\mathcal{I})\}.$

This model collection does not satisfy the embedding condition
$\mathcal{M}^{(h)}_n \subset \mathcal{M}^{(h)}_{n+1}$. Nevertheless, we
can use Proposition 1 with

    $\widetilde{\mathrm{pen}}_n(m) = \frac{\lambda_n}{n} D_J$

if $m$ is obtained from an index subset $\mathcal{I}_m$ in
$\mathcal{P}_J(\mathcal{I})$. This slight over-penalization leads to the
following result.

Proposition 6. Let $\alpha_0 < r$ be fixed, let
$0 < \alpha \le \alpha_0$ and let $\hat s^{(ht)}_{\widetilde m}$ be the
model selection estimator associated with the model collection
$\mathcal{M}^{(h)}_n$. Then, under Assumptions (2.4), (2.5), (2.7) and
(2.8):

    $MS(\hat s^{(ht)}_{\widetilde m}, \rho_\alpha) :=: B^{\frac{\alpha}{1+2\alpha}}_{2,\infty} \cap \mathcal{A}^\alpha_{\mathcal{M}^{(h)}}.$

Modifying Massart's strategy in order to obtain a practical estimator
changes the maxiset performance. The previous set
$\mathcal{A}^\alpha_{\mathcal{M}^{(h)}}$ is intersected with the strong
Besov space $B^{\frac{\alpha}{1+2\alpha}}_{2,\infty}$. Nevertheless, as
it will be proved in Section 4.4, the maxiset
$MS(\hat s^{(ht)}_{\widetilde m}, \rho_\alpha)$ is still a large
functional space. Indeed, for every $\alpha > 0$ and every $p$
satisfying
$p > \max\left(1,\ 2\left(\frac{1}{1+2\alpha} + 2\alpha\right)^{-1}\right)$,

(3.3)    $B^\alpha_{p,\infty} \subsetneq B^{\frac{\alpha}{1+2\alpha}}_{2,\infty} \cap \mathcal{A}^\alpha_{\mathcal{M}^{(h)}}.$

3.4. Comparisons of model selection estimators. In this para- 
graph, we compare the maxiset performances of the different model 
selection procedures described previously. For a chosen rate of conver- 
gence let us recall that the larger the maxiset, the better the estimator. 
To begin, we propose to focus on the model selection estimators which 
are tractable from the computational point of view. Gathering Propo- 
sitions [3j H] and [6] we obtain the following comparison. 

Proposition 7. Let < a < r. 

- If for every n, X n = Xq log(ra) with Ao large enough, then 

(3.4) MS(^\p a ) C MS{sf\p a ) C MS^^Pa). 



18 
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- If for every n, X n = Ao with Ao large enough, then 
(3.5) MS(s^\p a )CMS(s^\p a ). 

It means the followings. 

- If for every n, X n = A log(n) with A large enough, then, ac- 
cording to the maxiset point of view, the estimator strictly 
outperforms the estimator s~f^ which strictly outperforms the 



estimator 



m 

(st) 



- If for every n, X n = A or X n = A log(n) with A large enough, 
then, according to the maxiset point of view, the estimator 
strictly outperforms the estimator ' . 

The corresponding embeddings of functional spaces are proved in Sec- 
tion 14.41 The hard thresholding estimator appears as the best esti- 
mator when X n grows logarithmically while estimator is the best 
estimator when X n is constant. In both cases, those estimators perform 

a 

very well since their maxiset contains all the Besov spaces Bp^T with 
p>max(l, (i^ + 2a) _1 ). 

We forget now the calculability issues and consider the maxiset of the 
original procedure proposed by Massart. Propositions HI [5] and [6] lead 
then to the following result. 

Proposition 8. Let $0 < \alpha < r$.

- If for any $n$, $\lambda_n = \lambda_0 \log(n)$ with $\lambda_0$ large enough then

(3.6) 

MS(s%\p a )ZMS{s$,p a ) and MS(s$, p a ) <£ MS(s% ] , p a/ 



- If for any $n$, $\lambda_n = \lambda_0$ or $\lambda_n = \lambda_0 \log(n)$ with $\lambda_0$ large enough then

(3.7) MS(s^\p a )CMS(s^\ Pa ). 

Hence, within the maxiset framework, the estimator strictly out- 
performs the estimator while the estimators and are not 
comparable. Note that we did not consider the maxisets of the estima- 
tor in this section as they are identical to the ones of the tractable 

estimator . We summarize all those embeddings in Figure 1 and Figure 2. Figure 1 represents these maxiset embeddings for the choice $\lambda_n = \lambda_0 \log(n)$, while Figure 2 represents these maxiset embeddings for the choice $\lambda_n = \lambda_0$.

Figure 1. Maxiset embeddings when $\lambda_n = \lambda_0 \log(n)$ and $\max\left(1, 2\left(\frac{1}{1+2\alpha} + 2\alpha\right)^{-1}\right) < p < 2$.

Figure 2. Maxiset embeddings when $\lambda_n = \lambda_0$ and $\max\left(1, 2\left(\frac{1}{1+2\alpha} + 2\alpha\right)^{-1}\right) < p < 2$.



4. Proofs 

For any functions $u$ and $u'$ of $L_2(\mathcal{D})$, we denote by $\langle u, u' \rangle$ the $L_2$ scalar product between $u$ and $u'$:

$$\langle u, u' \rangle = \int_{\mathcal{D}} u(t) u'(t)\, dt.$$

We denote by $C$ a constant whose value may change at each line.

4.1. Proof of Theorem 2. Without loss of generality, we assume that $n_0 = 1$. We start by constructing a different representation of the white noise model. For any model $m$, we define $W_m$, the projection of the noise on $m$, by

$$W_m = \sum_{i=1}^{D_m} W_{e_i^m} e_i^m, \qquad W_{e_i^m} = \int_{\mathcal{D}} e_i^m(t)\, dW_t,$$

where $\{e_i^m\}_{i=1}^{D_m}$ is any orthonormal basis of $m$. For any function $s \in m$, we have:

$$W_s = \int_{\mathcal{D}} s(t)\, dW_t = \sum_{i=1}^{D_m} \langle s, e_i^m \rangle W_{e_i^m} = \langle W_m, s \rangle.$$

The key observation is now that, with high probability, $\|W_m\|^2$ can be controlled simultaneously over all models. More precisely, for any $m, m' \in \mathcal{M}_n$, we define the space $m + m'$ as the space spanned by the functions of $m$ and $m'$ and control the norm $\|W_{m+m'}\|^2$.

Lemma 1. Let $n$ be fixed and

$$A_n = \left\{ \sup_{m \in \mathcal{M}_n} \sup_{m' \in \mathcal{M}_n} \left\{ (D_m + D_{m'})^{-1} \|W_{m+m'}\|^2 \right\} \le \lambda_n \right\}.$$

Then, under Assumption (2.6), we have $\mathbb{P}\{A_n\} \ge p$.

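Proof. The Cirelson-Ibragimov-Sudakov inequality (see [21], page 10) implies that for any $t > 0$, any $m \in \mathcal{M}_n$ and any $m' \in \mathcal{M}_n$,

$$\mathbb{P}\left\{ \|W_{m+m'}\| \ge \mathbb{E}\left[\|W_{m+m'}\|\right] + t \right\} \le e^{-t^2/2}.$$

Since

$$\mathbb{E}\left[\|W_{m+m'}\|\right] \le \sqrt{\mathbb{E}\left[\|W_{m+m'}\|^2\right]} \le \sqrt{D_m + D_{m'}},$$

with $t = \sqrt{\lambda_n (D_m + D_{m'})} - \sqrt{D_m + D_{m'}}$, we obtain

$$\mathbb{P}\left\{ \|W_{m+m'}\|^2 \ge \lambda_n (D_m + D_{m'}) \right\} \le e^{-\frac{(\sqrt{\lambda_n} - 1)^2 (D_m + D_{m'})}{2}}.$$

Assumption (2.6) implies thus that

$$1 - \mathbb{P}\{A_n\} \le \sum_{m \in \mathcal{M}_n} \sum_{m' \in \mathcal{M}_n} \mathbb{P}\left\{ \|W_{m+m'}\|^2 \ge \lambda_n (D_m + D_{m'}) \right\} \le \sum_{m \in \mathcal{M}_n} \sum_{m' \in \mathcal{M}_n} e^{-\frac{(\sqrt{\lambda_n} - 1)^2 (D_m + D_{m'})}{2}} \le \left( \sum_{m \in \mathcal{M}_n} e^{-\frac{(\sqrt{\lambda_n} - 1)^2 D_m}{2}} \right)^2 \le 1 - p.$$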
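The tail bound used in this proof can be checked numerically. The following is a minimal sketch, assuming that the projected noise behaves like a standard Gaussian vector whose dimension `dim` plays the role of $D_m + D_{m'}$ and that $\lambda \ge 1$; the function names are hypothetical and only serve the illustration.

```python
import math
import random


def tail_bound(lam, dim):
    """Right-hand side exp(-(sqrt(lam) - 1)^2 * dim / 2) of the tail bound."""
    return math.exp(-((math.sqrt(lam) - 1.0) ** 2) * dim / 2.0)


def empirical_tail(lam, dim, trials=20000, seed=0):
    """Monte Carlo estimate of P{ ||W||^2 >= lam * dim } for W ~ N(0, I_dim)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Squared Euclidean norm of a standard Gaussian vector of size dim.
        sq_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        if sq_norm >= lam * dim:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for lam, dim in [(2.0, 5), (4.0, 5), (4.0, 20)]:
        print(lam, dim, empirical_tail(lam, dim), tail_bound(lam, dim))
```

In such experiments the empirical frequency typically sits well below the bound, which is expected since the concentration inequality is not tight for a fixed dimension.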
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We define m (n) (denoted m when there is no ambiguity), the model 
that minimizes a quantity close to $Q(s,n)$:

$$m_0(n) = \arg\min_{m \in \mathcal{M}_n} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{Kn} D_m \right\},$$

where $K$ is an absolute constant larger than 1 specified later. The proof of the theorem begins by a bound on $\|s_{m_0} - s\|^2$.



Lemma 2. For any $0 < \gamma < 1$,
(4.1) 

J V #p{a,} k ; 

i/ the constant K = K(l — 7) — 27 _1 — 1 satisfies K > 0. 
Proof. By definition, 

Da D 



- KP{A n ) l " 1 \ KV{A n } K ) Kn 



mo 



n n 
Thus, 

Dfo - An (1 < /g N_ ^ N 
/v n _ /7iV°™o/ in\°m) 

n 

— 2Y ra (s m() ) + ||s mo || + 2Y n ( y S- r f l ) 



m || 

2 

2 , O/S „\ II A 112 j 

2 

mo °|| || a m J \\ I ~/ = ' v Sm, — s ri 



< -2{s mo ,s) + \\s m J 2 + 2(s m , fi) - ||s m || 2 + -r=W Sjh _j 



- |- s //m " s \\ \\Sjh s \\ + ^p^" s ™~* T - 



Let $0 < \gamma < 1$. As $\hat{s}_{\hat{m}} - s_{m_0}$ is supported by the space $\hat{m} + m_0$ spanned by the functions of $\hat{m}$ and $m_0$, we obtain with the previous definition

Dfh - Dm {) < I, g _ ,|2 _ || g _ _ 1 1 2 , JL/W- s _ 3 \ 

— ll^mn II IPm °\\ < , — \ v¥ m+ran> m °Tnn/ 
n y/n 

< US _ oil 2 _ ||£. _ oil 2 _l lllW- II 2 

— IPwi-o °m °|| ~r || » m+mn 1 1 

n 

+ ~ (Pmo - S f + Pm - S|| 2 ) 

< (~ + x ) P m <> ~~ S H 2 + (7 ~ X ) ~ s " 2 + ^ll Writ + m '> 
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We multiply now by 1a„ to obtain 



A»U. — n Dm ° < ^ + lj Ujs mo -4 2 +{^~ l) UjSr 

+ lA„-||W A+m || 2 . 

n 



Using now the definition of $A_n$ and Lemma 1, it yields



\nU D * 1 < (~ + 1 W P mn - S\\ 2 + (- - 1 1 1 , ll.s 



■>L 1„ ^ I - til J-Ajpmo - *ll t I - - l ] X .1.,, ||*,„ 

n 



and thus 



(1 " 7) KU Drh Dm ° <(- + l)lA n \\s mo - sf 
n \7 



+ ( ^-1 ) lA„p A -sf + 2 7 A n l 



One obtains 



D-D -+ 1 - - 1 

n J-^m ^mo ^7 i II ~ ||2,7 -i || ~ 

^ni-A n 1 A n \\Sm — S\\ + J. 4 \\S f 

(4.2) n 

2j D mo 

+ -, *nlA n • 

1 — 7 n 
We derive now a bound on $\|s_{m_0} - s\|^2$. By definition,

II - II 2 _l_ \ D' m <- \\ _ ||2 _|_ \ -^"t 

An An 

and thus 

He _ oil 2 < He . _ «|| 2 _L \ — ~ "^ m ° 

An 
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By multiplying by 1a„ an d plugging the bound ( 14.21) . we have: 
i II II 2 <r ~\ II II 2 _l \ i ~ 

J-A„|Pmo s \\ — -M„|| s m s \\ i ^n^-An 

f + 1 

< 1a„ || s m — s || + _ ^ 1a„ \\S mo — S || 

, 7^ 1 ||" 1 1 2 i 2 7 ^ 1 -Pmp 

+ K(l- 7 ) AJ|SA_S|1 ^(1-7) "~ 
^(l + ^^y) Uj|«*-ll a 

+ ^|^yl^(|| Smo - S || 2 + ^||^ mo f) 
, 2 7 x , Ano 



K(l- 7 ) " " n n 

K(l--y)J" ^(1-7) 

i 7 II W |2 i Z 7 ^ -, ^mp 

+ W^T)n WmJ + ir(i- 7 ) AnlA "^r 

and thus 



2 , ! 
(1 -~ ) 



1-T775 r IaJI^-sII 2 



< 1 1 + .1 1 ; 1 ||.s (I( -.si' 2 



7 +1 1 || T y 112 1 2 7 , 1 °mo 



Taking the expectation on both sides yields 
f + 1 ' 



1 ~ 1 , )F{.4„}||.s ()) „ -.s| 



2 1 



<n+^ r ^iEni^-,ii 2 i 



7 + 1 2 7 TOr/1 \ 



+ ITT, V + E^i n ^{A,}A. 



K(l-7) K(l- 7 ) J / n 
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and thus as soon as 1 — K Ji_^ > 

2_1 

1 + 



l^o - all 2 < 7 E [II** - S H 2 ] 

P{AJ 



§+i 

^(1-7) 



5(1-7) + ^(1-7) P {^™}^« -D mo 



2 -+i \ . , n 



1 " ifcj ) P{4J 



K(l -7) + ^ - 1 

(k(i - 7) - 2 - - 1) F{ A,} 

+ ? + 1 + ^nAn}Xn D mo 



(k(i - 7) - f - 1) P{A.} 



n 



^+7 .r„. „2i f + l + 2 7 P{A n }A njD 



KF{A n } J in»{A n } 



which yields 



with X = X(l - 7) - 2 - 1. ■ 

Now, let us specify the constants. We take 

g{5, Oq)= inf inf (s^+i - si = (1-5)^ - 1 + 5 e (0,1). 

ae^.aolxgf^i-tf] I J 

Then we take 

1 - + 1 

7 = a o) and ^ = 7 • 

8 2-7 

This implies K = y and assumptions of the previous lemma are sat- 
isfied. We consider now the dependency of m on n and prove by 
induction the following lemma. 

Lemma 3. If there exists C\ > such that for any n, 



E [\\s M n/2) - S|| 2 ] < Cx 



2a 



n 
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then, provided Ai > Y(<5, p, «o), where 

there exists a constant C 2 such that for any n, 

( 2a 
A n \ 2a+1 
—) 

Proof. By using M. n /2 C and (14. ip . for any (3 G [0, 1], if we denote 
4 — 11 1 1 2 _. x ^ m o( n ) 

|Pmo(?i) % 1 1 t <A n ' 



we have 



A < ||s mo ( n / 2 ) — s\\ + A 



2 . x An„(n/2) 



A„. . 2Z^ mo ( n /2) 



< /?||s mo (n/2) - S|| 2 + (1 - /3)||s mo(n /2) - S|| 2 + ^T^A^/2 . 

if + ± 

< /? - - 7 E [p A (n/2) " sf] + (1 - P)\\s mo{n/2 ) - S|| 2 

+ ( ( K{2 ~ + 1) + ^l) + A n \ 2A n/2 D mo(w/2) 
^ ^P{A n/2 }A n/2 X J 2A n/2y ! 

As A n < 2A n /2, there exists (3 n G [0, 1] such that 



^P{A„ /2 }A n/2 K ) 2A n/2 



so that 



+ (1 - /?„) ||s mo ( n/2 ) - S\\ + 



2 2A n /2-D mo (n/2) 



The induction can now be started. We assume now that for all n' < 
n — 1 

2a 

I, || 2 . x D m (n>) n ( K'\ 2a+1 

\\S mo (n>) ~ S\\ +\ n > Rn , <C 2 

By assumption, 

2a 

2A„./o \ a»+i 



E[||S A(n/2) - S f] <Ci{^) 




so that, 
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A < f3 n ^ 



KnK/2} 

k + ^ a 

So, we have to prove that 




KF{A n/2 }C 2 



+ 1 




< 1 



or equivalently,
/ i\ ■■■ [ 

Pn 



7 



d if (~ + 1) 
+ — 



ifP{A n/2 }C 2 KP{A n/2 }A n/2 
This condition can be rewritten as 



+ 



2K-f 
K 



+ 



'2A 



n/2 



2A 



n/2 



A n 



< 1. 



ft 



+ — 



+ 



'2A 



n/2 



\kw>{A n/2 }C 2 k¥{A n/2 }\ n/2 k \ A 



2tj + l 



< 1 



A, 



2A 



n/2 



2a + l 



or 



A 



n/2 



if (Hi) 



AT{A„ /2 } 



A, 



Pn 



2A 



n/2 



A„ 



7 



d 2K 7 



2A 



n/2 



K¥{A n/2 } C 2 K 



-1 



provided the right-hand side is positive. Under the very mild assumption $2(1 - \delta)\lambda_{n/2} \ge \lambda_n \ge \lambda_{n/2}$, it is sufficient to ensure that (4.3) is true. Indeed, $\lambda_{n/2} \ge \lambda_1$ and using values of the constants we have



< 



< 



< 



\n/2 
if(j + 1) 

knA n/2 } 

2(1+1 

V 



+ 1 



A, 



Pn 

9(5, «o) 



2A 



A„ 



7 



Ci 2K-i 



n/2 

j£+j cx 

fTp C 2 



2A 



n/2 



KP{A„ /2 } c 2 K 



-1 



-1 



a ) 



/ 16 



pg{5,a Q ) \g(S,a ) 



+ 1 



ifp 0(5, ao) 
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■ 

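Finally, Theorem 2 follows from the previous lemma that gives the following inequality:

$$\frac{Q(s,n)}{K} \le \inf_{m \in \mathcal{M}_n} \left\{ \|s - s_m\|^2 + \frac{\lambda_n}{Kn} D_m \right\} \le \|s_{m_0(n)} - s\|^2 + \frac{\lambda_n}{Kn} D_{m_0(n)} \le C_2 \left( \frac{\lambda_n}{n} \right)^{\frac{2\alpha}{2\alpha+1}}.$$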



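4.2. Proofs of Theorem 3 and Proposition 1. Theorem 2 implies that for any $s \in MS(\hat{s}_{\hat{m}}, \rho_\alpha)$,

$$\sup_n \left\{ \rho_{n,\alpha}^{-2} Q(s,n) \right\} < \infty$$

or equivalently there exists $C > 0$ such that for any $n$,

(4.4)    $\inf_{m \in \mathcal{M}_n} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\} \le C \rho_{n,\alpha}^2$.

By definition of $V_n$, any function $s_m$ with $m \in \mathcal{M}_n$ belongs to $V_n$ and thus Inequality (4.4) implies

(4.5)    $\|P_{V_n} s - s\|^2 \le C \rho_{n,\alpha}^2$,

that is $s \in \mathcal{L}^{\alpha}_{V}$. By definition, $\widetilde{\mathcal{M}}$ is a larger collection than $\mathcal{M}_n$ and thus Inequality (4.4) also implies that for any $n$,

$$\inf_{m \in \widetilde{\mathcal{M}}} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\} \le C \rho_{n,\alpha}^2,$$

which turns out to be a characterization of $\mathcal{A}^{\alpha}_{\widetilde{\mathcal{M}}}$ when $\rho_{n,\alpha} = \left( \frac{\lambda_n}{n} \right)^{\frac{\alpha}{2\alpha+1}}$, as a consequence of the following lemma.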

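Lemma 4. Under Assumptions of Theorem 3,

(4.6)    $\sup_n \left\{ \left( \frac{\lambda_n}{n} \right)^{-\frac{2\alpha}{1+2\alpha}} \inf_{m \in \widetilde{\mathcal{M}}} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\} \right\} < \infty \iff s \in \mathcal{A}^{\alpha}_{\widetilde{\mathcal{M}}}$.

Proof. We denote

$$m(n) = \arg\min_{m \in \widetilde{\mathcal{M}}} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\}.$$

First, let us assume that for any $n$,

$$\|s_{m(n)} - s\|^2 + \frac{\lambda_n}{n} D_{m(n)} \le C_1 \left( \frac{\lambda_n}{n} \right)^{\frac{2\alpha}{2\alpha+1}},$$

where $C_1$ is a constant. Then,

$$D_{m(n)} \le C_1 \left( \frac{\lambda_n}{n} \right)^{-\frac{1}{1+2\alpha}}.$$

Using $\lambda_n \le \lambda_{2n} \le 2\lambda_n$, for $M \in \mathbb{N}^*$, as soon as $M \ge C_1 (\lambda_1)^{-\frac{1}{1+2\alpha}}$, there exists $n \in \mathbb{N}^*$ such that

(4.7)    $C_1 \left( \frac{\lambda_n}{n} \right)^{-\frac{1}{1+2\alpha}} \le M < C_1 \left( \frac{\lambda_{2n}}{2n} \right)^{-\frac{1}{1+2\alpha}} \le C_1 2^{\frac{1}{1+2\alpha}} \left( \frac{\lambda_n}{n} \right)^{-\frac{1}{1+2\alpha}}$.

Then,

$$\inf_{\{m \in \widetilde{\mathcal{M}}:\, D_m \le M\}} \|s_m - s\|^2 \le \inf_{\{m \in \widetilde{\mathcal{M}}:\, D_m \le M\}} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\} \le \inf_{m \in \widetilde{\mathcal{M}}} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\} \le C_1 \left( \frac{\lambda_n}{n} \right)^{\frac{2\alpha}{2\alpha+1}} \le C_1^{2\alpha+1} 2^{\frac{2\alpha}{1+2\alpha}} M^{-2\alpha}.$$

The third inequality holds because the minimizer $m(n)$ satisfies $D_{m(n)} \le M$ by the left-hand side of (4.7).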
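Conversely, assume that there exists $C_1$ satisfying

$$\inf_{\{m \in \widetilde{\mathcal{M}}:\, D_m \le M\}} \|s_m - s\|^2 \le C_1 M^{-2\alpha}.$$

Then for any $T > 0$,

$$\inf_{m \in \widetilde{\mathcal{M}}} \left\{ \|s_m - s\|^2 + T^2 D_m \right\} = \inf_{M \in \mathbb{N}^*} \inf_{\{m \in \widetilde{\mathcal{M}}:\, D_m = M\}} \left\{ \|s_m - s\|^2 + T^2 M \right\}$$

$$\le \inf_{M \in \mathbb{N}^*} \left\{ C_1 M^{-2\alpha} + T^2 M \right\}$$

$$\le \inf_{x > 0} \left\{ C_1 x^{-2\alpha} + T^2 (x + 1) \right\}$$

$$\le \left( C_1 (2\alpha C_1)^{-\frac{2\alpha}{1+2\alpha}} + (2\alpha C_1)^{\frac{1}{1+2\alpha}} \right) (T^2)^{\frac{2\alpha}{1+2\alpha}} + T^2$$

$$\le C_2 (T^2)^{\frac{2\alpha}{1+2\alpha}},$$

where $C_2$ is a constant.

We have proved so far that $MS(\hat{s}_{\hat{m}}, \rho_\alpha) \subset \mathcal{L}^{\alpha}_{V} \cap \mathcal{A}^{\alpha}_{\widetilde{\mathcal{M}}}$. It remains to prove the converse inclusion. Corollary 1 and the previous lemma imply that it suffices to prove that inequalities (4.5) and (4.6) imply inequality (4.4) (possibly with a different constant $C$).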
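Let $s \in \mathcal{L}^{\alpha}_{V} \cap \mathcal{A}^{\alpha}_{\widetilde{\mathcal{M}}}$. By inequality (4.6), for every $n$, there exists a model $m \in \widetilde{\mathcal{M}}$ such that

$$\|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \le C \rho_{n,\alpha}^2.$$

By definition of $\widetilde{\mathcal{M}}$, there exists $k$ such that $m \in \mathcal{M}_k$.
If $k \le n$ then $m \in \mathcal{M}_n$ and thus

$$\inf_{m \in \mathcal{M}_n} \left\{ \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m \right\} \le C \rho_{n,\alpha}^2.$$

Otherwise $k > n$ and let $m' \in \mathcal{M}$ be the model such that $\mathcal{I}_m = \mathcal{I}_{m'} \cap \mathcal{J}_k$ as defined in Section 2.2. We define $m'' \in \mathcal{M}_n$ by its index set $\mathcal{I}_{m''} = \mathcal{I}_{m'} \cap \mathcal{J}_n$. Remark that $m'' \subset m$ and $s_m - s_{m''} \in V_n^{\perp}$, so

$$\|s_{m''} - s\|^2 + \frac{\lambda_n}{n} D_{m''} = \|s_{m''} - s_m\|^2 + \|s_m - s\|^2 + \frac{\lambda_n}{n} D_{m''}$$

$$\le \|P_{V_n} s - s\|^2 + \|s_m - s\|^2 + \frac{\lambda_n}{n} D_m$$

$$\le C \rho_{n,\alpha}^2.$$

Theorem 3 is proved.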
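The trade-off behind Lemma 4 can be explored numerically. The sketch below assumes the extreme case where the approximation error equals $C_1 M^{-2\alpha}$ exactly, and evaluates $\inf_M \{C_1 M^{-2\alpha} + T^2 M\}$ normalized by $(T^2)^{2\alpha/(1+2\alpha)}$; the function names and the search range `max_dim` are hypothetical choices made for the illustration.

```python
def penalized_oracle(t2, alpha, c1=1.0, max_dim=10000):
    """Minimum over M in {1, ..., max_dim} of c1 * M**(-2*alpha) + t2 * M."""
    best = float("inf")
    for m in range(1, max_dim + 1):
        value = c1 * m ** (-2.0 * alpha) + t2 * m
        if value < best:
            best = value
    return best


def normalized_oracle(t2, alpha, c1=1.0, max_dim=10000):
    """Penalized oracle divided by the rate (t2)**(2*alpha / (1 + 2*alpha))."""
    rate = t2 ** (2.0 * alpha / (1.0 + 2.0 * alpha))
    return penalized_oracle(t2, alpha, c1, max_dim) / rate


if __name__ == "__main__":
    for k in range(2, 7):
        t2 = 10.0 ** (-k)
        print(t2, normalized_oracle(t2, alpha=1.0))
```

For small $T^2$ the normalized value stays essentially constant, which is the behaviour the equivalence (4.6) relies on. The search range must exceed the minimizing dimension, so `max_dim` has to grow when $T^2$ decreases further.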

The proof of Proposition [1] relies on the definition of pen„(m). Recall 
that for any model m G AA' n there is a model fh G Ai such that 

m = span : z G I m fl J" n } 

and that 

Pen„(m) = — £> m - 
n 

One deduces 

11 1 1 o ,- — - / \ 11 1 1 q M 1 1 o 

p m -s + pen n (m) = \\s m - s\\ H Dm > s m - s H D m 

n n 

and thus 

inf {||s m - s|| 2 + p5fi n (m)} < Cp 2 n inf {||s m - s|| 2 + — D m \ < Cp 2 n 

Mimicking the proof of Theorem 3, one obtains Proposition 1.

4.3. Proof of Proposition 2. In the same spirit as in the proof of Theorem 2, for any $n$, we denote

(4.8)    $m_0(n) = \arg\min_{m \in \mathcal{M}} \left\{ \|s_m - s\|^2 + \frac{\mathrm{pen}(m)}{4} \right\} = \arg\min_{m \in \mathcal{M}} \left\{ \|s_m - s\|^2 + \frac{\lambda_n D_m}{4n} \right\}$

(we have set $K = 4$) and

(4.9)    $\hat{m}(n) = \arg\min_{m \in \mathcal{M}} \left\{ -\|\hat{s}_m\|^2 + \mathrm{pen}(m) \right\} = \arg\min_{m \in \mathcal{M}} \left\{ -\|\hat{s}_m\|^2 + \frac{\lambda_n D_m}{n} \right\}$.

In the nested case, Lemma 2 becomes the following much stronger lemma:

Lemma 5. For any $n$, almost surely

(4.10)    $\|s_{m_0(n)} - s\|^2 \le \|\hat{s}_{\hat{m}(n)} - s\|^2$.

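Proof. As the models are embedded, either $\hat{m}(n) \subset m_0(n)$ or $m_0(n) \subset \hat{m}(n)$.

In the first case, $\|s_{m_0(n)} - s\|^2 \le \|s_{\hat{m}(n)} - s\|^2 \le \|\hat{s}_{\hat{m}(n)} - s\|^2$ and thus (4.10) holds.

Otherwise, by construction

$$\|s_{m_0(n)} - s\|^2 + \frac{\lambda_n D_{m_0(n)}}{4n} \le \|s_{\hat{m}(n)} - s\|^2 + \frac{\lambda_n D_{\hat{m}(n)}}{4n},$$

$$-\|\hat{s}_{\hat{m}(n)}\|^2 + \frac{\lambda_n D_{\hat{m}(n)}}{n} \le -\|\hat{s}_{m_0(n)}\|^2 + \frac{\lambda_n D_{m_0(n)}}{n},$$

and thus, as $m_0(n) \subset \hat{m}(n)$,

$$\|s_{\hat{m}(n) \setminus m_0(n)}\|^2 \le \frac{\lambda_n D_{\hat{m}(n)}}{4n} - \frac{\lambda_n D_{m_0(n)}}{4n},$$

$$\frac{\lambda_n D_{\hat{m}(n)}}{n} - \frac{\lambda_n D_{m_0(n)}}{n} \le \|\hat{s}_{\hat{m}(n) \setminus m_0(n)}\|^2.$$

Combining these two inequalities yields

$$\|s_{\hat{m}(n) \setminus m_0(n)}\|^2 \le \frac{1}{4} \|\hat{s}_{\hat{m}(n) \setminus m_0(n)}\|^2 \le \frac{1}{2} \left( \|\hat{s}_{\hat{m}(n) \setminus m_0(n)} - s_{\hat{m}(n) \setminus m_0(n)}\|^2 + \|s_{\hat{m}(n) \setminus m_0(n)}\|^2 \right)$$

and thus

$$\|s_{\hat{m}(n) \setminus m_0(n)}\|^2 \le \|\hat{s}_{\hat{m}(n) \setminus m_0(n)} - s_{\hat{m}(n) \setminus m_0(n)}\|^2.$$

Now, (4.10) holds as

$$\|s_{m_0(n)} - s\|^2 = \|s_{\hat{m}(n)} - s\|^2 + \|s_{\hat{m}(n) \setminus m_0(n)}\|^2$$

$$\le \|s_{\hat{m}(n)} - s\|^2 + \|\hat{s}_{\hat{m}(n) \setminus m_0(n)} - s_{\hat{m}(n) \setminus m_0(n)}\|^2$$

$$\le \|s_{\hat{m}(n)} - s\|^2 + \|\hat{s}_{\hat{m}(n)} - s_{\hat{m}(n)}\|^2 = \|\hat{s}_{\hat{m}(n)} - s\|^2. \quad ■$$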
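Since inequality (4.10) holds for every realization of the noise, it can be checked on simulated data. The sketch below assumes a sequence-space version of the model: the nested model of dimension $D$ keeps the first $D$ coefficients, the observations are the coefficients plus independent Gaussian noise of variance $1/n$, and the empty model ($D = 0$) is allowed. These choices and the function name are hypothetical and only illustrate the lemma.

```python
import math
import random


def nested_selection(beta, n, lam, seed=0):
    """Return (||s_{m0} - s||^2, ||hat s_{hat m} - s||^2) for nested models."""
    rng = random.Random(seed)
    # Noisy coefficients: y_k = beta_k + noise / sqrt(n).
    y = [b + rng.gauss(0.0, 1.0) / math.sqrt(n) for b in beta]

    bias = sum(b * b for b in beta)   # ||s_m - s||^2 for the empty model
    energy = 0.0                      # ||hat s_m||^2 for the empty model
    best_oracle, d_oracle = bias, 0
    best_emp, d_hat = 0.0, 0
    for d in range(1, len(beta) + 1):
        bias -= beta[d - 1] ** 2
        energy += y[d - 1] ** 2
        oracle_crit = bias + lam * d / (4.0 * n)   # criterion (4.8)
        emp_crit = -energy + lam * d / n           # criterion (4.9)
        if oracle_crit < best_oracle:
            best_oracle, d_oracle = oracle_crit, d
        if emp_crit < best_emp:
            best_emp, d_hat = emp_crit, d

    oracle_bias = sum(b * b for b in beta[d_oracle:])
    loss = sum((yk - bk) ** 2 for yk, bk in zip(y[:d_hat], beta[:d_hat]))
    loss += sum(b * b for b in beta[d_hat:])
    return oracle_bias, loss


if __name__ == "__main__":
    coefficients = [k ** -1.5 for k in range(1, 201)]
    for seed in range(5):
        print(nested_selection(coefficients, n=500, lam=3.0, seed=seed))
```

In every run the first returned value should not exceed the second, up to floating-point rounding, whatever the value of `lam`.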
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Now we can conclude the proof of Proposition El with an induction 
similar to the one used in the proof of Lemma [3j Indeed, let 



A- — |p mo (n) — s|| + 



A <T lie oil 2 I ^ D m (n/2) 

A S |pm (n/2) — S\\ + - 



2 ^n-^mo(") 



4n 



An 

< /3 n E(\\Srh{n/2) ~ S\\' 2 ) + (1 - (3n)\\s mo (n/2) ~ sf + 



i, , -, , >| II 2 ^n/2^m (n/2) 



2A n/2 4(n/2) 

The choice (3 n — 1 — - is such that 5 < /3„ < | and it implies 



, 2 A n /2-D mo (ra/2) 



A < /? n E(p A(n/2 ) - S|| ) + (1 - /?„) M|Sm (n/2) ~ s|| + |( ^ , } , 

Using now almost the same induction as in Theorem El we obtain 



n / \ n 

<(^)*(C?A.^ + d-A))Ci(^)* 



where C\ is a constant. It suffices thus to verify that 



2q 

(^f) (^l^ 1 + (1 - ^)) < 1, 



c 2 

which is the case as soon as C2 > 2g (Sa) ■ 



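4.4. Space embeddings. In this paragraph we provide many embedding properties between the functional spaces considered in Section 3.
Let us recall the following definitions:

$$\mathcal{B}^{\alpha}_{p,\infty} = \left\{ s \in L_2([0,1]) :\ \sup_{J \in \mathbb{N}} 2^{J\left(\alpha - \frac{1}{p} + \frac{1}{2}\right)p} \sum_{k=0}^{2^J - 1} |\beta_{Jk}|^p < \infty \right\},$$

$$\mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} = \left\{ s \in L_2([0,1]) :\ \sup_{J \in \mathbb{N}} 2^{\frac{2J\alpha}{1+2\alpha}} \sum_{k=0}^{2^J - 1} \beta_{Jk}^2 < \infty \right\},$$

$$\mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}} = \left\{ s \in L_2([0,1]) :\ \sup_{J \in \mathbb{N}} 2^{2J\alpha} \sum_{j \ge J}\ \sum_{k > \lfloor 2^J (j - J + 1)^{-\theta} \rfloor} |\beta_j|_{(k)}^2 < \infty \right\},$$

$$\mathcal{W}_{\frac{2}{1+2\alpha}} = \left\{ s \in L_2([0,1]) :\ \sup_{u > 0} u^{\frac{2}{1+2\alpha}} \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \mathbf{1}_{|\beta_{jk}| > u} < \infty \right\}.$$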



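4.4.1. Space embeddings: part I.

$$\bigcup_{p \ge 1,\ p > \frac{2}{1+2\alpha}} \mathcal{B}^{\alpha}_{p,\infty} \overset{(i)}{\subsetneq} \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}} \overset{(ii)}{\subsetneq} \mathcal{W}_{\frac{2}{1+2\alpha}}.$$

Proof of (i).

Let $s$ belong to $\mathcal{B}^{\alpha}_{p,\infty}$ with $p \ge 1$ and $p > \frac{2}{2\alpha+1}$ and, for any scale $j \in \mathbb{N}$, let us denote by $(|\beta_j|_{(k)})_k$ the sequence of the non-increasing reordered wavelet coefficients of any level $j$. Then there exists a nonnegative constant $C$ such that for any $j \in \mathbb{N}$,

$$\sum_{k=1}^{2^j} |\beta_j|_{(k)}^p \le C 2^{-jp(\alpha + 1/2 - 1/p)}.$$

Fix $J \in \mathbb{N}$. If $p < 2$, according to Lemma 4.16 of [21], for all $j$ larger than $J$,

$$\sum_{k = \lfloor 2^J (j-J+1)^{-\theta} \rfloor + 1}^{2^j} |\beta_j|_{(k)}^2 \le C^{2/p} 2^{-2j(\alpha + 1/2 - 1/p)} \left( \lfloor 2^J (j-J+1)^{-\theta} \rfloor \right)^{1 - 2/p}$$

$$\le C^{2/p} 2^{-2J\alpha} 2^{-2(j-J)(\alpha + 1/2 - 1/p)} (j - J + 1)^{\theta(2/p - 1)}.$$

Summing over the indices $j$ larger than $J$ yields

$$\sum_{j \ge J}\ \sum_{k > \lfloor 2^J (j-J+1)^{-\theta} \rfloor} |\beta_j|_{(k)}^2 \le C^{2/p} 2^{-2J\alpha} \sum_{j' \ge 0} 2^{-2j'(\alpha + 1/2 - 1/p)} (j' + 1)^{\theta(2/p - 1)}$$

and thus

$$\sup_{J \ge 0} 2^{2J\alpha} \sum_{j \ge J}\ \sum_{k > \lfloor 2^J (j-J+1)^{-\theta} \rfloor} |\beta_j|_{(k)}^2 \le C^{2/p} \sum_{j' \ge 0} 2^{-2j'(\alpha + 1/2 - 1/p)} (j' + 1)^{\theta(2/p - 1)} < \infty.$$

The series converges because $\alpha + 1/2 - 1/p > 0$ when $p > \frac{2}{2\alpha+1}$. So $s$ belongs to $\mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$.
For the case $p = 2$,

$$\sum_{j \ge J}\ \sum_{k > \lfloor 2^J (j-J+1)^{-\theta} \rfloor} |\beta_j|_{(k)}^2 \le \sum_{j \ge J} \sum_{k=1}^{2^j} |\beta_j|_{(k)}^2 \le C \sum_{j \ge J} 2^{-2j\alpha} \le C 2^{-2J\alpha}.$$

Thus

$$\sup_{J \in \mathbb{N}} 2^{2J\alpha} \sum_{j \ge J}\ \sum_{k > \lfloor 2^J (j-J+1)^{-\theta} \rfloor} |\beta_j|_{(k)}^2 < \infty.$$

So $s$ also belongs to $\mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$.

We conclude that for any $p \ge 1$ satisfying $p > \frac{2}{1+2\alpha}$, $\mathcal{B}^{\alpha}_{p,\infty} \subset \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$.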
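Let us now prove the strict inclusion by considering the function $s_0$ defined as follows:

$$s_0 = \sum_{j \ge 0} \sum_{k=0}^{2^j - 1} \beta_{jk} \psi_{jk} = \sum_{j \ge 0} 2^{-\sqrt{j}} \psi_{j0}.$$

For any $(\alpha', p)$ such that $\alpha' > \max\left(\frac{1}{p} - \frac{1}{2}, 0\right)$,

$$2^{jp(\alpha' + 1/2 - 1/p)} \sum_{k=0}^{2^j - 1} |\beta_{jk}|^p = 2^{jp(\alpha' + 1/2 - 1/p) - p\sqrt{j}}$$

and thus goes to $+\infty$ when $j$ goes to $+\infty$. It implies that $s_0$ does not belong to $\mathcal{B}^{\alpha'}_{p,\infty}$ for any $p \ge 1$.

Now for any $J \in \mathbb{N}$,

$$2^{2J\alpha} \sum_{j \ge J}\ \sum_{k > \lfloor 2^J (j-J+1)^{-\theta} \rfloor} |\beta_j|_{(k)}^2 = 2^{2J\alpha} \sum_{j \ge \min\{j' \ge J:\ 2^J (j' - J + 1)^{-\theta} < 1\}} 2^{-2\sqrt{j}} \le 2^{2J\alpha} \sum_{j \ge 2^{J/\theta} + J} 2^{-2\sqrt{j}},$$

which implies

$$\sup_{J \ge 0} 2^{2J\alpha} \sum_{j \ge J}\ \sum_{k > \lfloor 2^J (j-J+1)^{-\theta} \rfloor} |\beta_j|_{(k)}^2 < \infty$$

and thus $s_0 \in \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$. Hence (i) is proved.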
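The tail estimate used for $p < 2$ in the proof of (i) can be illustrated numerically. The sketch below relies on the elementary bound $\sum_{k > K} |b|_{(k)}^2 \le \frac{p}{2-p} K^{1-2/p} \left(\sum_k |b_k|^p\right)^{2/p}$ for coefficients sorted by non-increasing magnitude; the constant $\frac{p}{2-p}$ comes from this elementary argument and is an assumption of the sketch, not necessarily the constant of Lemma 4.16 of [21]. The function names are hypothetical.

```python
import random


def tail_energy(coeffs, keep):
    """Sum of squares of all coefficients except the `keep` largest in magnitude."""
    ordered = sorted((abs(c) for c in coeffs), reverse=True)
    return sum(c * c for c in ordered[keep:])


def lp_tail_bound(coeffs, keep, p):
    """Bound p/(2-p) * keep**(1 - 2/p) * (sum |c|^p)**(2/p), for 0 < p < 2, keep >= 1."""
    s_p = sum(abs(c) ** p for c in coeffs)
    return p / (2.0 - p) * keep ** (1.0 - 2.0 / p) * s_p ** (2.0 / p)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) / (k + 1) for k in range(256)]
    for keep in (1, 8, 64):
        print(keep, tail_energy(sample, keep), lp_tail_bound(sample, keep, 1.5))
```

The bound follows from $|b|_{(k)}^p \le \frac{1}{k}\sum_i |b_i|^p$ and a comparison of $\sum_{k > K} k^{-2/p}$ with an integral, which is why it requires $p < 2$.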



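Proof of (ii).

There is no doubt that $\mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}} \subset \mathcal{W}_{\frac{2}{1+2\alpha}}$ since $\mathcal{W}_{\frac{2}{1+2\alpha}} = \mathcal{A}^{\alpha}_{\mathcal{M}^{(l)}}$. The strict inclusion is a direct consequence of (vi), just below. ■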



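4.4.2. Space embeddings: part II.

$$\bigcup_{p > \max\left(1,\ 2\left(\frac{1}{1+2\alpha} + 2\alpha\right)^{-1}\right)} \mathcal{B}^{\alpha}_{p,\infty} \overset{(iii)}{\subsetneq} \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}} \overset{(iv)}{\subsetneq} \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{W}_{\frac{2}{1+2\alpha}}.$$

Proof of (iii).

Let $\alpha > 0$ and $p \ge 1$ satisfying $p > 2\left((1+2\alpha)^{-1} + 2\alpha\right)^{-1}$. Using the classical Besov embeddings, $\mathcal{B}^{\alpha}_{p,\infty} \subsetneq \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty}$ and, according to (i), we have $\mathcal{B}^{\alpha}_{p,\infty} \subsetneq \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$. Hence $\mathcal{B}^{\alpha}_{p,\infty} \subsetneq \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$ and (iii) is proved.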
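The threshold on $p$ in (iii) can be recovered by simple arithmetic. The sketch below assumes the usual index condition for the embedding of $\mathcal{B}^{\alpha}_{p,\infty}$ into $\mathcal{B}^{\alpha'}_{2,\infty}$ when $p \le 2$, namely $\alpha - \frac{1}{p} \ge \alpha' - \frac{1}{2}$, applied with $\alpha' = \frac{\alpha}{1+2\alpha}$; the function names are hypothetical. Exact rational arithmetic is used so that the critical case can be tested by equality.

```python
from fractions import Fraction


def critical_p(alpha):
    """Threshold 2 * ((1 + 2*alpha)**(-1) + 2*alpha)**(-1) appearing in (iii)."""
    alpha = Fraction(alpha)
    return 2 / (1 / (1 + 2 * alpha) + 2 * alpha)


def embedding_gap(alpha, p):
    """(alpha - 1/p) - (alpha/(1+2*alpha) - 1/2); nonnegative when the index condition holds."""
    alpha, p = Fraction(alpha), Fraction(p)
    return (alpha - 1 / p) - (alpha / (1 + 2 * alpha) - Fraction(1, 2))


if __name__ == "__main__":
    for alpha in (Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(3)):
        p_star = critical_p(alpha)
        print(alpha, p_star, embedding_gap(alpha, p_star))
```

The gap vanishes exactly at the threshold and is positive above it. The threshold is always at least $\frac{2}{1+2\alpha}$, so the condition of (i) is automatically satisfied.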



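Proof of (iv).

We already know that $\mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}} \subset \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{W}_{\frac{2}{1+2\alpha}}$. The strict inclusion is a direct consequence of (vi) proved in the next subsection.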



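4.4.3. A non-embedded case.

$$\mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}} \overset{(v)}{\not\subset} \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{W}_{\frac{2}{1+2\alpha}} \quad \text{and} \quad \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{W}_{\frac{2}{1+2\alpha}} \overset{(vi)}{\not\subset} \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}.$$

Proof of (v).

Let us consider the function $s_0 \in \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$ defined in the proof of (i). We already know that it does not belong to $\mathcal{B}^{\alpha'}_{p,\infty}$ for any $(\alpha', p)$ satisfying $\alpha' > \max\left(\frac{1}{p} - \frac{1}{2}, 0\right)$. As a consequence, for the case $(\alpha', p) = \left(\frac{\alpha}{1+2\alpha}, 2\right)$ where $\alpha > 0$, we deduce that $s_0$ does not belong to $\mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty}$.

Moreover, we immediately deduce that $\mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}} \not\subset \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{W}_{\frac{2}{1+2\alpha}}$.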
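Proof of (vi).

Let $s_1 \in L_2([0,1])$ whose wavelet expansion is given by

$$s_1 = \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \beta_{jk} \psi_{jk}.$$

We set

$$\beta_{jk} = \begin{cases} 2^{-j/2} & \text{if } k < 2^{\frac{j}{1+2\alpha}}, \\ 0 & \text{otherwise.} \end{cases}$$

We are going to prove that $s_1 \in \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty} \cap \mathcal{W}_{\frac{2}{1+2\alpha}}$ while $s_1 \notin \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$.

Summing at a given scale $j$ yields

$$\sum_{k=0}^{2^j - 1} \beta_{jk}^2 = 2^{\frac{j}{1+2\alpha}} 2^{-j} = 2^{-\frac{2\alpha j}{1+2\alpha}}$$

and thus $s_1 \in \mathcal{B}^{\alpha/(1+2\alpha)}_{2,\infty}$.

Let $0 < u < 1$ and $j_u$ the real number such that $2^{j_u} = u^{-2}$. Then

$$\sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \mathbf{1}_{|\beta_{jk}| > u} = \sum_{j < j_u} \sum_{k=0}^{2^j - 1} \mathbf{1}_{k < 2^{\frac{j}{1+2\alpha}}} \le 2^{\frac{1}{1+2\alpha}} \left( 2^{\frac{1}{1+2\alpha}} - 1 \right)^{-1} u^{-\frac{2}{1+2\alpha}}.$$

So

$$\sup_{0 < u < 1} u^{\frac{2}{1+2\alpha}} \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} \mathbf{1}_{|\beta_{jk}| > u} < \infty$$

and $s_1 \in \mathcal{W}_{\frac{2}{1+2\alpha}}$.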
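The counting argument for $s_1$ can be reproduced numerically. The sketch below assumes the coefficients $\beta_{jk} = 2^{-j/2}$ for integers $0 \le k < 2^{j/(1+2\alpha)}$, counts those exceeding a threshold $u$, and multiplies the count by $u^{2/(1+2\alpha)}$. The function names and the cap `max_level` on the scanned scales are hypothetical.

```python
import math


def count_large_coefficients(u, alpha, max_level=200):
    """Number of pairs (j, k) with 2**(-j/2) > u and 0 <= k < 2**(j / (1 + 2*alpha))."""
    count = 0
    for j in range(max_level):
        if 2.0 ** (-j / 2.0) <= u:
            break  # coefficients at this scale and beyond are not larger than u
        # Integers k with 0 <= k < x, for real x >= 1, number ceil(x).
        count += math.ceil(2.0 ** (j / (1.0 + 2.0 * alpha)))
    return count


def weak_quasi_norm(u, alpha):
    """Threshold-weighted count u**(2/(1+2*alpha)) * #{|beta_jk| > u}."""
    return u ** (2.0 / (1.0 + 2.0 * alpha)) * count_large_coefficients(u, alpha)


if __name__ == "__main__":
    for t in (1, 10, 40, 80):
        u = 2.0 ** (-t / 4.0)
        print(u, weak_quasi_norm(u, alpha=1.0))
```

The weighted count stays bounded as $u$ decreases, in line with the membership of $s_1$ in the weak space. Because the number of retained indices per scale is rounded up to an integer, the observed values can slightly exceed the closed-form constant of the display above.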
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Let us now prove that Si does not belong to A^ h) . Fix J G N large 
enough. Then 

j>Jfc=L2J(j-J+l)-»J 



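Let $J^*$ be the real number such that $2^{\frac{J^*}{1+2\alpha}} = 2^J (J^* - J + 1)^{-\theta}$.

From $J^* = (2\alpha+1)J - (2\alpha+1)\theta \log_2(J^* - J + 1)$ one deduces thus $J^* \le (2\alpha+1)J$, which implies $J^* \ge (2\alpha+1)J - (2\alpha+1)\theta \log_2(2\alpha J + 1)$, and finally $J^* \le (2\alpha+1)J - (2\alpha+1)\theta \log_2\left(2\alpha J + 1 - (2\alpha+1)\theta \log_2(2\alpha J + 1)\right)$.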

So, 

j 



B f - Vf 9 j/{2o+1) - ^ 

> C 2~ 2J * a /( 2a+1 ^ 

> C {\og) 2ad 2~ 2Ja . 



So, 



■j\ 2 (k) = °°- 



- i>Jfe>L2- / (j-J+i)- , 'j 
This implies that $s_1 \notin \mathcal{A}^{\alpha}_{\mathcal{M}^{(h)}}$. Finally (vi) is proved. ■
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